Improved spontaneous Mandarin speech recognition by disfluency interruption point (IP) detection using prosodic features
Abstract
This paper presents a new approach to spontaneous Mandarin speech recognition that explicitly accounts for disfluencies. The basic idea is to detect disfluency interruption points (IPs) prior to recognition and then to use this information during rescoring in the recognition process. For accurate IP detection, a set of new features was proposed and tested, designed around the special characteristics of Mandarin Chinese. A new approach that incorporates a decision tree into maximum entropy model training was also developed to improve IP detection accuracy. Experimental results indicated that the proposed features and the IP detection approach were useful, and that the resulting disfluency information improved speech recognition performance.
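The rescoring step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: the `Hypothesis` fields, the log-linear combination, and the `ip_weight` parameter are hypothetical and are not taken from the paper. The sketch assumes an N-best list in which each hypothesis records where it implies an IP, and a separate detector that supplies an IP probability at each of the same word boundaries.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical structure, not from the paper)."""
    words: List[str]
    base_log_score: float   # first-pass recognizer score, log domain
    ip_flags: List[bool]    # True where this hypothesis implies an IP at a boundary
    ip_probs: List[float]   # detector's P(IP) at the same boundaries


def ip_log_likelihood(h: Hypothesis, eps: float = 1e-6) -> float:
    """Log-probability of the hypothesis's IP pattern under the detector output."""
    total = 0.0
    for flag, p in zip(h.ip_flags, h.ip_probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += math.log(p) if flag else math.log(1.0 - p)
    return total


def rescore(nbest: List[Hypothesis], ip_weight: float = 1.0) -> List[Hypothesis]:
    """Re-rank hypotheses by base score plus weighted IP agreement (assumed form)."""
    return sorted(
        nbest,
        key=lambda h: h.base_log_score + ip_weight * ip_log_likelihood(h),
        reverse=True,
    )


# Usage: the detector is confident of an IP at the second boundary only.
with_ip = Hypothesis(["a", "a", "b"], -10.0, [False, True], [0.1, 0.9])
without_ip = Hypothesis(["a", "c", "b"], -9.9, [False, False], [0.1, 0.9])
best = rescore([without_ip, with_ip])[0]
```

In this toy case the hypothesis whose IP pattern agrees with the detector overtakes one with a slightly better first-pass score. Setting `ip_weight` to 0 recovers the first-pass ranking.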
Similar resources
Spontaneous Mandarin Speech Recognition with Disfluencies Detected by Latent Prosodic Modeling (LPM)
In this paper, a new approach for improved spontaneous Mandarin speech recognition using Latent Prosodic Modeling (LPM) for disfluency interruption point (IP) detection is presented. The basic idea is to detect the disfluency interruption points (IPs) prior to the recognition, and then to incorporate this information into the recognition process via second-pass rescoring. For accurate dete...
Important and new features with analysis for disfluency interruption point (IP) detection in spontaneous Mandarin speech
This paper presents a set of new features, some duration-related and some pitch-related, to be used in disfluency interruption point (IP) detection for spontaneous Mandarin speech, considering the special linguistic characteristics of Mandarin Chinese. A decision tree is incorporated into the maximum entropy model to perform the IP detection. By examining performance degradation when each s...
Latent prosodic modeling (LPM) for speech with applications in recognizing spontaneous Mandarin speech with disfluencies
In this paper, a new approach of Latent Prosodic Modeling (LPM) for analyzing the prosody of speech is presented. Based on a set of newly defined prosodic characters, prosodic terms, documents, and the Probabilistic Latent Semantic Analysis (PLSA) framework, prosody can be modeled using a set of prosodic states representing various latent factors such as speakers, speaking rate, utterance modal...
Automatic disfluency identification in conversational speech using multiple knowledge sources
Disfluencies occur frequently in spontaneous speech. Detection and correction of disfluencies can make automatic speech recognition transcripts more readable for human readers, and can aid downstream processing by machine. This work investigates a number of knowledge sources for disfluency detection, including acoustic-prosodic features, a language model (LM) to account for repetition patterns,...
A Cross-language Study on Automatic Speech Disfluency Detection
We investigate two systems for automatic disfluency detection on English and Mandarin conversational speech data. The first system combines various lexical and prosodic features in a Conditional Random Field model for detecting edit disfluencies. The second system combines acoustic and language model scores for detecting filled pauses through constrained speech recognition. We compare the contr...
Publication date: 2005